<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><link rel="stylesheet" type="text/css" href="insn.css"/><meta name="generator" content="iform.xsl"/><title>BFMOPS (widening)</title></head><body><table style="margin: 0 auto;"><tr><td><div class="topbar"><a href="index.html">Base Instructions</a></div></td><td><div class="topbar"><a href="fpsimdindex.html">SIMD&amp;FP Instructions</a></div></td><td><div class="topbar"><a href="sveindex.html">SVE Instructions</a></div></td><td><div class="topbar"><a href="mortlachindex.html">SME Instructions</a></div></td><td><div class="topbar"><a href="encodingindex.html">Index by Encoding</a></div></td><td><div class="topbar"><a href="shared_pseudocode.html">Shared Pseudocode</a></div></td><td><div class="topbar"><a href="notice.html">Proprietary Notice</a></div></td></tr></table><hr/><h2 class="instruction-section">BFMOPS (widening)</h2><p>BFloat16 sum of outer products and subtract</p>
      <p class="aml">The BFloat16 floating-point sum of outer products and subtract instruction works with a 32-bit element ZA tile.</p>
      <p class="aml">This instruction multiplies the SVL<sub>S</sub>×2 sub-matrix of BFloat16 values held in the first source vector by the 2×SVL<sub>S</sub> sub-matrix of BFloat16 values in the second source vector.</p>
      <p class="aml">Each source vector is independently predicated by a corresponding governing predicate. When a 16-bit source element is Inactive it is treated as having the value +0.0, but if both pairs of source vector elements that correspond to a 32-bit destination element contain Inactive elements, then the destination element remains unmodified.</p>
      <p class="aml">The resulting SVL<sub>S</sub>×SVL<sub>S</sub> single-precision floating-point sum of outer products is then destructively subtracted from the single-precision floating-point destination tile. This is equivalent to performing a 2-way dot product and subtract from each of the destination tile elements.</p>
      <p class="aml">Each 32-bit container of the first source vector holds 2 consecutive column elements of each row of a SVL<sub>S</sub>×2 sub-matrix. Similarly, each 32-bit container of the second source vector holds 2 consecutive row elements of each column of a 2×SVL<sub>S</sub> sub-matrix.</p>
      <p class="aml">This instruction follows SME BFloat16 numerical behaviors.</p>
    
    <h3 class="classheading"><a id="iclass_mortlach"/>SME<span style="font-size:smaller;"><br/>(FEAT_SME)
          </span></h3><p class="desc"/><div class="regdiagram-32"><table class="regdiagram"><thead><tr><td>31</td><td>30</td><td>29</td><td>28</td><td>27</td><td>26</td><td>25</td><td>24</td><td>23</td><td>22</td><td>21</td><td>20</td><td>19</td><td>18</td><td>17</td><td>16</td><td>15</td><td>14</td><td>13</td><td>12</td><td>11</td><td>10</td><td>9</td><td>8</td><td>7</td><td>6</td><td>5</td><td>4</td><td>3</td><td>2</td><td>1</td><td>0</td></tr></thead><tbody><tr class="firstrow"><td class="l">1</td><td class="r">0</td><td class="l">0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td class="r">0</td><td colspan="5" class="lr">Zm</td><td colspan="3" class="lr">Pm</td><td colspan="3" class="lr">Pn</td><td colspan="5" class="lr">Zn</td><td class="lr">1</td><td class="lr">0</td><td class="lr">0</td><td colspan="2" class="lr">ZAda</td></tr><tr class="secondrow"><td colspan="2"/><td colspan="9"/><td colspan="5"/><td colspan="3"/><td colspan="3"/><td colspan="5"/><td class="droppedname">S</td><td/><td/><td colspan="2"/></tr></tbody></table></div><div class="encoding"><h4 class="encoding"/><a id="bfmops_za32_pp_zz_"/><p class="asm-code">BFMOPS  <a href="#sa_zada" title="ZA tile ZA0-ZA3 (field &quot;ZAda&quot;)">&lt;ZAda&gt;</a>.S, <a href="#sa_pn" title="First governing scalable predicate register P0-P7 (field &quot;Pn&quot;)">&lt;Pn&gt;</a>/M, <a href="#sa_pm" title="Second governing scalable predicate register P0-P7 (field &quot;Pm&quot;)">&lt;Pm&gt;</a>/M, <a href="#sa_zn" title="First source scalable vector register (field &quot;Zn&quot;)">&lt;Zn&gt;</a>.H, <a href="#sa_zm" title="Second source scalable vector register (field &quot;Zm&quot;)">&lt;Zm&gt;</a>.H</p></div><p class="pseudocode">if !<a href="shared_pseudocode.html#impl-aarch64.HaveSME.0" title="function: boolean HaveSME()">HaveSME</a>() then UNDEFINED;
integer a = <a href="shared_pseudocode.html#impl-shared.UInt.1" title="function: integer UInt(bits(N) x)">UInt</a>(Pn);
integer b = <a href="shared_pseudocode.html#impl-shared.UInt.1" title="function: integer UInt(bits(N) x)">UInt</a>(Pm);
integer n = <a href="shared_pseudocode.html#impl-shared.UInt.1" title="function: integer UInt(bits(N) x)">UInt</a>(Zn);
integer m = <a href="shared_pseudocode.html#impl-shared.UInt.1" title="function: integer UInt(bits(N) x)">UInt</a>(Zm);
integer da = <a href="shared_pseudocode.html#impl-shared.UInt.1" title="function: integer UInt(bits(N) x)">UInt</a>(ZAda);
boolean sub_op = TRUE;</p>
  <div class="encoding-notes"/><h3 class="explanations">Assembler Symbols</h3><div class="explanations"><table><col class="asyn-l"/><col class="asyn-r"/><tr><td>&lt;ZAda&gt;</td><td><a id="sa_zada"/>
        
          <p class="aml">Is the name of the ZA tile ZA0-ZA3, encoded in the "ZAda" field.</p>
        
      </td></tr></table><table><col class="asyn-l"/><col class="asyn-r"/><tr><td>&lt;Pn&gt;</td><td><a id="sa_pn"/>
        
          <p class="aml">Is the name of the first governing scalable predicate register P0-P7, encoded in the "Pn" field.</p>
        
      </td></tr></table><table><col class="asyn-l"/><col class="asyn-r"/><tr><td>&lt;Pm&gt;</td><td><a id="sa_pm"/>
        
          <p class="aml">Is the name of the second governing scalable predicate register P0-P7, encoded in the "Pm" field.</p>
        
      </td></tr></table><table><col class="asyn-l"/><col class="asyn-r"/><tr><td>&lt;Zn&gt;</td><td><a id="sa_zn"/>
        
          <p class="aml">Is the name of the first source scalable vector register, encoded in the "Zn" field.</p>
        
      </td></tr></table><table><col class="asyn-l"/><col class="asyn-r"/><tr><td>&lt;Zm&gt;</td><td><a id="sa_zm"/>
        
          <p class="aml">Is the name of the second source scalable vector register, encoded in the "Zm" field.</p>
        
      </td></tr></table></div><div class="syntax-notes"/>
    <div class="ps"><a id="execute"/><h3 class="pseudocode">Operation</h3>
      <p class="pseudocode"><a href="shared_pseudocode.html#impl-aarch64.CheckStreamingSVEAndZAEnabled.0" title="function: CheckStreamingSVEAndZAEnabled()">CheckStreamingSVEAndZAEnabled</a>();
constant integer VL = <a href="shared_pseudocode.html#impl-aarch64.CurrentVL.read.none" title="accessor: integer CurrentVL">CurrentVL</a>;
constant integer PL = VL DIV 8;
constant integer dim = VL DIV 32;
bits(PL) mask1 = <a href="shared_pseudocode.html#impl-aarch64.P.read.2" title="accessor: bits(width) P[integer n, integer width]">P</a>[a, PL];
bits(PL) mask2 = <a href="shared_pseudocode.html#impl-aarch64.P.read.2" title="accessor: bits(width) P[integer n, integer width]">P</a>[b, PL];
bits(VL) operand1 = <a href="shared_pseudocode.html#impl-aarch64.Z.read.2" title="accessor: bits(width) Z[integer n, integer width]">Z</a>[n, VL];
bits(VL) operand2 = <a href="shared_pseudocode.html#impl-aarch64.Z.read.2" title="accessor: bits(width) Z[integer n, integer width]">Z</a>[m, VL];
bits(dim*dim*32) operand3 = <a href="shared_pseudocode.html#impl-aarch64.ZAtile.read.3" title="accessor: bits(width) ZAtile[integer tile, integer esize, integer width]">ZAtile</a>[da, 32, dim*dim*32];
bits(dim*dim*32) result;

for row = 0 to dim-1
    for col = 0 to dim-1
        // determine row/col predicates
        boolean prow_0 = (<a href="shared_pseudocode.html#impl-aarch64.ActivePredicateElement.3" title="function: boolean ActivePredicateElement(bits(N) pred, integer e, integer esize)">ActivePredicateElement</a>(mask1, 2*row + 0, 16));
        boolean prow_1 = (<a href="shared_pseudocode.html#impl-aarch64.ActivePredicateElement.3" title="function: boolean ActivePredicateElement(bits(N) pred, integer e, integer esize)">ActivePredicateElement</a>(mask1, 2*row + 1, 16));
        boolean pcol_0 = (<a href="shared_pseudocode.html#impl-aarch64.ActivePredicateElement.3" title="function: boolean ActivePredicateElement(bits(N) pred, integer e, integer esize)">ActivePredicateElement</a>(mask2, 2*col + 0, 16));
        boolean pcol_1 = (<a href="shared_pseudocode.html#impl-aarch64.ActivePredicateElement.3" title="function: boolean ActivePredicateElement(bits(N) pred, integer e, integer esize)">ActivePredicateElement</a>(mask2, 2*col + 1, 16));

        bits(32) sum = <a href="shared_pseudocode.html#impl-shared.Elem.read.3" title="accessor: bits(size) Elem[bits(N) vector, integer e, integer size]">Elem</a>[operand3, row*dim+col, 32];
        if (prow_0 &amp;&amp; pcol_0) || (prow_1 &amp;&amp; pcol_1) then
            bits(16) erow_0 = (if prow_0 then <a href="shared_pseudocode.html#impl-shared.Elem.read.3" title="accessor: bits(size) Elem[bits(N) vector, integer e, integer size]">Elem</a>[operand1, 2*row + 0, 16] else <a href="shared_pseudocode.html#impl-shared.FPZero.2" title="function: bits(N) FPZero(bit sign, integer N)">FPZero</a>('0', 16));
            bits(16) erow_1 = (if prow_1 then <a href="shared_pseudocode.html#impl-shared.Elem.read.3" title="accessor: bits(size) Elem[bits(N) vector, integer e, integer size]">Elem</a>[operand1, 2*row + 1, 16] else <a href="shared_pseudocode.html#impl-shared.FPZero.2" title="function: bits(N) FPZero(bit sign, integer N)">FPZero</a>('0', 16));
            bits(16) ecol_0 = (if pcol_0 then <a href="shared_pseudocode.html#impl-shared.Elem.read.3" title="accessor: bits(size) Elem[bits(N) vector, integer e, integer size]">Elem</a>[operand2, 2*col + 0, 16] else <a href="shared_pseudocode.html#impl-shared.FPZero.2" title="function: bits(N) FPZero(bit sign, integer N)">FPZero</a>('0', 16));
            bits(16) ecol_1 = (if pcol_1 then <a href="shared_pseudocode.html#impl-shared.Elem.read.3" title="accessor: bits(size) Elem[bits(N) vector, integer e, integer size]">Elem</a>[operand2, 2*col + 1, 16] else <a href="shared_pseudocode.html#impl-shared.FPZero.2" title="function: bits(N) FPZero(bit sign, integer N)">FPZero</a>('0', 16));
            if sub_op then
                boolean honor_altfp = FALSE;   // Alternate handling ignored
                if prow_0 then erow_0 = <a href="shared_pseudocode.html#impl-shared.BFNeg.2" title="function: bits(N) BFNeg(bits(N) op, boolean honor_altfp)">BFNeg</a>(erow_0, honor_altfp);
                if prow_1 then erow_1 = <a href="shared_pseudocode.html#impl-shared.BFNeg.2" title="function: bits(N) BFNeg(bits(N) op, boolean honor_altfp)">BFNeg</a>(erow_1, honor_altfp);
            sum = <a href="shared_pseudocode.html#impl-shared.BFDotAdd.6" title="function: bits(N) BFDotAdd(bits(N) addend, bits(N DIV 2) op1_a, bits(N DIV 2) op1_b, bits(N DIV 2) op2_a, bits(N DIV 2) op2_b, FPCRType fpcr_in)">BFDotAdd</a>(sum, erow_0, erow_1, ecol_0, ecol_1, FPCR[]);

        <a href="shared_pseudocode.html#impl-shared.Elem.write.3" title="accessor: Elem[bits(N) &amp;vector, integer e, integer size] = bits(size) value">Elem</a>[result, row*dim+col, 32] = sum;

<a href="shared_pseudocode.html#impl-aarch64.ZAtile.write.3" title="accessor: ZAtile[integer tile, integer esize, integer width] = bits(width) value">ZAtile</a>[da, 32, dim*dim*32] = result;</p>
    </div>
  <hr/><table style="margin: 0 auto;"><tr><td><div class="topbar"><a href="index.html">Base Instructions</a></div></td><td><div class="topbar"><a href="fpsimdindex.html">SIMD&amp;FP Instructions</a></div></td><td><div class="topbar"><a href="sveindex.html">SVE Instructions</a></div></td><td><div class="topbar"><a href="mortlachindex.html">SME Instructions</a></div></td><td><div class="topbar"><a href="encodingindex.html">Index by Encoding</a></div></td><td><div class="topbar"><a href="shared_pseudocode.html">Shared Pseudocode</a></div></td><td><div class="topbar"><a href="notice.html">Proprietary Notice</a></div></td></tr></table><p class="versions">
      Internal version only: isa v33.62, AdvSIMD v29.12, pseudocode v2023-03_rel, sve v2023-03_rc3b
      ; Build timestamp: 2023-03-31T11:36
    </p><p class="copyconf">
      Copyright © 2010-2023 Arm Limited or its affiliates. All rights reserved.
      This document is Non-Confidential.
    </p></body></html>
